Preprocessing¶

In [78]:
using CSV, DataFrames, Statistics, Plots, CategoricalArrays, Dates, Random, Gurobi, JuMP
In [75]:
data = CSV.read("MLOpt_data.csv", DataFrame)

y = data."No-show"

X = select(data, [:Gender_binary, :Age_std, :Scholarship, :Hipertension, :Diabetes, :Alcoholism, :Handcap, :SMS_received, :lead_time_std, :day_of_week, :appointment_month])
X_features = select(data, [:Gender_binary, :Age_std, :Scholarship, :Hipertension, :Diabetes, :Alcoholism, :Handcap, :SMS_received, :lead_time_std, :day_of_week, :appointment_month, :Neighbourhood])
110458×12 DataFrame (110433 rows omitted)

    Row │ Gender_binary  Age_std      Scholarship  Hipertension  Diabetes  Alcoholism  Handcap  SMS_received  lead_time_std  day_of_week  appointment_month  Neighbourhood
        │ Int64          Float64      Int64        Int64         Int64     Int64       Int64    Int64         Float64        String15     String7            String31
────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      1 │ 1              1.07858      0            1             0         0           0        0             -0.667653      Friday       April              JARDIM DA PENHA
      2 │ 0              0.818894     0            0             0         0           0        0             -0.667653      Friday       April              JARDIM DA PENHA
      3 │ 1              1.07858      0            0             0         0           0        0             -0.667653      Friday       April              MATA DA PRAIA
      4 │ 1              -1.25863     0            0             0         0           0        0             -0.667653      Friday       April              PONTAL DE CAMBURI
      5 │ 1              0.818894     0            1             1         0           0        0             -0.667653      Friday       April              JARDIM DA PENHA
      6 │ 1              1.68453      0            1             0         0           0        0             -0.536562      Friday       April              REPÚBLICA
      7 │ 1              -0.6094      0            0             0         0           0        0             -0.536562      Friday       April              GOIABEIRAS
      8 │ 1              0.0831057    0            0             0         0           0        0             -0.536562      Friday       April              GOIABEIRAS
      9 │ 1              -0.695964    0            0             0         0           0        0             -0.667653      Friday       April              ANDORINHAS
     10 │ 1              -0.782527    0            0             0         0           0        0             -0.536562      Friday       April              CONQUISTA
     11 │ 1              -0.306429    0            0             0         0           0        0             -0.536562      Friday       April              NOVA PALESTINA
     12 │ 0              -0.349711    0            0             0         0           0        1             -0.471016      Friday       April              NOVA PALESTINA
     13 │ 1              -0.652682    1            0             0         0           0        0             -0.602107      Friday       April              NOVA PALESTINA
      ⋮ │ ⋮
 110447 │ 0              -0.176584    0            1             0         0           0        0             -0.536562      Wednesday    June               MARIA ORTIZ
 110448 │ 1              -0.00345758  0            0             0         0           0        0             -0.602107      Wednesday    June               MARIA ORTIZ
 110449 │ 1              -0.782527    0            0             0         0           0        0             -0.667653      Tuesday      June               MARIA ORTIZ
 110450 │ 1              0.559204     0            0             0         0           0        1              2.01971       Tuesday      June               MARIA ORTIZ
 110451 │ 1              -0.652682    0            0             0         0           0        1              2.01971       Tuesday      June               MARIA ORTIZ
 110452 │ 1              0.212951     0            0             0         0           0        1              1.62644       Tuesday      June               MARIA ORTIZ
 110453 │ 1              0.689049     0            0             0         0           0        1              1.62644       Tuesday      June               MARIA ORTIZ
 110454 │ 1              0.818894     0            0             0         0           0        1              1.62644       Tuesday      June               MARIA ORTIZ
 110455 │ 1              0.602485     0            0             0         0           0        1              1.62644       Tuesday      June               MARIA ORTIZ
 110456 │ 1              -0.695964    0            0             0         0           0        1              2.01971       Tuesday      June               MARIA ORTIZ
 110457 │ 1              0.0398241    0            0             0         0           0        1              2.01971       Tuesday      June               MARIA ORTIZ
 110458 │ 1              0.73233      0            0             0         0           0        1              2.01971       Tuesday      June               MARIA ORTIZ
In [76]:
X.Gender_binary = CategoricalArray(X.Gender_binary)
X.Scholarship = CategoricalArray(X.Scholarship)
X.Hipertension = CategoricalArray(X.Hipertension)
X.Diabetes = CategoricalArray(X.Diabetes)
X.Alcoholism = CategoricalArray(X.Alcoholism)
X.Handcap = CategoricalArray(X.Handcap)
X.SMS_received = CategoricalArray(X.SMS_received)
X.day_of_week = CategoricalArray(X.day_of_week)
X.appointment_month = CategoricalArray(X.appointment_month)

X_features.Gender_binary = CategoricalArray(X_features.Gender_binary)
X_features.Scholarship = CategoricalArray(X_features.Scholarship)
X_features.Hipertension = CategoricalArray(X_features.Hipertension)
X_features.Diabetes = CategoricalArray(X_features.Diabetes)
X_features.Alcoholism = CategoricalArray(X_features.Alcoholism)
X_features.Handcap = CategoricalArray(X_features.Handcap)
X_features.SMS_received = CategoricalArray(X_features.SMS_received)
X_features.day_of_week = CategoricalArray(X_features.day_of_week)
X_features.appointment_month = CategoricalArray(X_features.appointment_month)
X_features.Neighbourhood = CategoricalArray(X_features.Neighbourhood)
110458-element CategoricalArray{String31,1,UInt32}:
 String31("JARDIM DA PENHA")
 String31("JARDIM DA PENHA")
 String31("MATA DA PRAIA")
 String31("PONTAL DE CAMBURI")
 String31("JARDIM DA PENHA")
 String31("REPÚBLICA")
 String31("GOIABEIRAS")
 String31("GOIABEIRAS")
 String31("ANDORINHAS")
 String31("CONQUISTA")
 ⋮
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
 String31("MARIA ORTIZ")
In [77]:
seed = 15095
(X_trainvalid, y_trainvalid), (X_test, y_test) = IAI.split_data(:classification, X, y, seed=seed, train_proportion=0.8)
(X_train, y_train), (X_valid, y_valid) = IAI.split_data(:classification, X_trainvalid, y_trainvalid, seed=seed, train_proportion=0.8);

(Xfeat_trainvalid, yfeat_trainvalid), (Xfeat_test, yfeat_test) = IAI.split_data(:classification, X_features, y, seed=seed, train_proportion=0.8)
(Xfeat_train, yfeat_train), (Xfeat_valid, yfeat_valid) = IAI.split_data(:classification, Xfeat_trainvalid, yfeat_trainvalid, seed=seed, train_proportion=0.8);

Feature Selection¶

In [ ]:
grid = IAI.GridSearch(
    IAI.OptimalFeatureSelectionClassifier(
        random_seed=seed,
    ),
    sparsity=1:12,
)
IAI.fit!(grid, Xfeat_trainvalid, Array(yfeat_trainvalid), validation_criterion=:auc)
All Grid Results:

 Row │ sparsity  train_score  valid_score  rank_valid_score 
     │ Int64     Float64      Float64      Int64            
─────┼──────────────────────────────────────────────────────
   1 │        1    0.0309003     0.69331                  1
   2 │        2    0.0353047     0.656788                12
   3 │        3    0.038904      0.65837                 10
   4 │        4    0.040399      0.659949                 4
   5 │        5    0.0415989     0.658556                 7
   6 │        6    0.0421186     0.658296                11
   7 │        7    0.0426327     0.658424                 9
   8 │        8    0.0429795     0.65862                  6
   9 │        9    0.0431705     0.658543                 8
  10 │       10    0.0435749     0.659456                 5
  11 │       11    0.0439266     0.661078                 3
  12 │       12    0.0442522     0.661305                 2

Best Params:
  sparsity => 1

Best Model - Fitted OptimalFeatureSelectionClassifier:
  Constant: -1.38275
  Weights:
    lead_time_std:  0.384818
  (Higher score indicates stronger prediction for class `1`)
In [66]:
plot(grid, type=:validation)
[Plot: validation score across sparsity values]
In [67]:
IAI.variable_importance(IAI.get_learner(grid))
98×2 DataFrame (73 rows omitted)

 Row │ Feature                         Importance
     │ Symbol                          Float64
─────┼────────────────────────────────────────────
   1 │ lead_time_std                   1.0
   2 │ Age_std                         0.0
   3 │ Alcoholism=1                    0.0
   4 │ Diabetes=1                      0.0
   5 │ Gender_binary=1                 0.0
   6 │ Handcap=0                       0.0
   7 │ Handcap=1                       0.0
   8 │ Handcap=2                       0.0
   9 │ Handcap=3                       0.0
  10 │ Handcap=4                       0.0
  11 │ Hipertension=1                  0.0
  12 │ Neighbourhood=ANDORINHAS        0.0
  13 │ Neighbourhood=ANTÔNIO HONÓRIO   0.0
   ⋮ │               ⋮                  ⋮
  87 │ Neighbourhood=VILA RUBIM        0.0
  88 │ SMS_received=1                  0.0
  89 │ Scholarship=1                   0.0
  90 │ appointment_month=April         0.0
  91 │ appointment_month=June          0.0
  92 │ appointment_month=May           0.0
  93 │ day_of_week=Friday              0.0
  94 │ day_of_week=Monday              0.0
  95 │ day_of_week=Saturday            0.0
  96 │ day_of_week=Thursday            0.0
  97 │ day_of_week=Tuesday             0.0
  98 │ day_of_week=Wednesday           0.0

Predictions¶

RF¶

In [5]:
# TODO Train a Random Forest 
# Grid search for hyperparameter tuning
max_depths = [15]
num_trees = [200]
minbuckets = [100, 200]
rf_grid = IAI.GridSearch(
    IAI.RandomForestClassifier(
        random_seed=seed,
        max_categoric_levels_before_warning=10000,
    ),
    max_depth=max_depths,
    num_trees=num_trees,
    minbucket=minbuckets,
)
# Fit training data
IAI.fit_cv!(rf_grid, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)


# Calculate AUC score on the validation set (name the variable holding the AUC score as `val_auc`)
val_auc = IAI.score(rf_grid, X_valid, y_valid, criterion=:auc)
0.7589772832042827
In [6]:
# TODO: Get best model
model = rf_grid
lnr_rf = IAI.get_learner(model)

# TODO: Display selected model's parameters
params = IAI.get_params(lnr_rf)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Number of trees of the best model is ", params[:num_trees])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10000, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :num_trees => 200, :parallel_processes => nothing, :num_threads => nothing, :max_features => :auto, :show_progress => false, :checkpoint_max_files => nothing, :max_depth => 15, :hyperplane_config => NamedTuple[], :checkpoint_interval => 10, :regression_features => Any[], :weighted_minbucket => 1, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :checkpoint_file => nothing, :fast_num_support_iterations => 1, :normalize_X => true, :bootstrap_samples => true, :criterion => :gini, :minbucket => 100, :cp_tuning_se_tolerance => 0.0, :cp => 0)
Max depth of the best model is 15
Number of trees of the best model is 200
Minbucket size of the best model is 100
In [7]:
# TODO: Calculate AUC score for test data
auc_score_rf = IAI.score(model, X_test, y_test, criterion=:auc)

# Get class predictions for test set
acc_rf = IAI.score(model, X_test, y_test, criterion=:misclassification)

println("Test AUC score: ", round(auc_score_rf, digits=4))
println("Test Accuracy: ", round(acc_rf, digits=4))
Test AUC score: 0.7331
Test Accuracy: 0.7978

CART¶

In [8]:
max_depths = [12,15,18]
minbuckets = [100,200]

grid_cart = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=seed,
        localsearch = false,
    ),
    max_depth=max_depths,
    minbucket=minbuckets,
    criterion=:gini
)

IAI.fit_cv!(grid_cart, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)
Optimal Trees Visualization
In [9]:
# TODO: Get best model
model = grid_cart
lnr_cart = IAI.get_learner(model)

# TODO: Display selected model's parameters
params = IAI.get_params(lnr_cart)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:localsearch => false, :fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :ls_ignore_errors => false, :ls_num_hyper_restarts => 5, :max_depth => 12, :parallel_processes => nothing, :num_threads => nothing, :hyperplane_config => NamedTuple[], :show_progress => false, :ls_warmstart_criterion => :gini, :checkpoint_interval => 10, :ls_max_hyper_iterations => 9223372036854775807, :regression_features => Any[], :checkpoint_max_files => nothing, :weighted_minbucket => 1, :ls_bootstrap_samples => false, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :ls_num_greedy_features => :auto, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :fast_num_support_iterations => 1, :checkpoint_file => nothing, :normalize_X => true, :ls_num_categoric_restarts => 10, :criterion => :gini, :minbucket => 200, :ls_max_search_iterations => 9223372036854775807, :ls_scan_reverse_split => false, :ls_num_tree_restarts => 100, :cp_tuning_se_tolerance => 0.0, :cp => 0.0002150480321117902)
Max depth of the best model is 12
Minbucket size of the best model is 200
In [10]:
# TODO: Calculate AUC score for test data
auc_score_cart = IAI.score(model, X_test, y_test, criterion=:auc)

# Get class predictions for test set
acc_cart = IAI.score(model, X_test, y_test, criterion=:misclassification)

println("Test AUC score: ", round(auc_score_cart, digits=4))
println("Test Accuracy: ", round(acc_cart, digits=4))
Test AUC score: 0.724
Test Accuracy: 0.7981

OCT¶

In [11]:
# Train an OCT, this grid search might take some time
# Grid search for hyperparameter tuning
# TODO: 
max_depths = [4,6,8]
minbuckets = [100, 200]

grid_oct = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=seed,
        max_categoric_levels_before_warning=10000,
    ),
    max_depth=max_depths,
    minbucket=minbuckets,
    criterion=:gini,
) 

IAI.fit_cv!(grid_oct, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)
Optimal Trees Visualization
In [12]:
# TODO: Get best model
model = grid_oct
lnr_oct = IAI.get_learner(model)

# TODO: Display selected model's parameters
params = IAI.get_params(lnr_oct)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:localsearch => true, :fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10000, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :ls_ignore_errors => false, :ls_num_hyper_restarts => 5, :max_depth => 6, :parallel_processes => nothing, :num_threads => nothing, :hyperplane_config => NamedTuple[], :show_progress => false, :ls_warmstart_criterion => :gini, :checkpoint_interval => 10, :ls_max_hyper_iterations => 9223372036854775807, :regression_features => Any[], :checkpoint_max_files => nothing, :weighted_minbucket => 1, :ls_bootstrap_samples => false, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :ls_num_greedy_features => :auto, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :fast_num_support_iterations => 1, :checkpoint_file => nothing, :normalize_X => true, :ls_num_categoric_restarts => 10, :criterion => :gini, :minbucket => 200, :ls_max_search_iterations => 9223372036854775807, :ls_scan_reverse_split => false, :ls_num_tree_restarts => 100, :cp_tuning_se_tolerance => 0.0, :cp => 1.3361958865643064e-5)
Max depth of the best model is 6
Minbucket size of the best model is 200
In [13]:
# TODO: Calculate AUC score for test data
auc_score_oct = IAI.score(model, X_test, y_test, criterion=:auc)

# Get class predictions for test set
acc_oct = IAI.score(model, X_test, y_test, criterion=:misclassification)

println("Test AUC score: ", round(auc_score_oct, digits=4))
println("Test Accuracy: ", round(acc_oct, digits=4))
Test AUC score: 0.7253
Test Accuracy: 0.7981

Prescription¶

In [ ]:
"""
    prescriptive_policy_from_model(lnr, X_test)

Given a trained IAI classifier `lnr` (RF, CART, or OCT) and a test
feature matrix `X_test`, compute:

  - optimal overbooking decisions z ∈ {0,1} per test point
  - expected cost under baseline (z = 0)
  - expected cost under the prescriptive policy
  - cost savings per slot
  - expected patients seen per slot (baseline and policy)
  - extra patients seen per slot

Assumes labels were originally encoded as:
  y = 1 (no-show), y = 0 (show)
"""
function prescriptive_policy_from_model(lnr, X_test;
                                        c_idle::Float64 = 150.0,
                                        c_delay::Float64 = 75.0)

    # Predicted probabilities from the model
    # proba[:,1] = P(y=0), proba[:,2] = P(y=1)
    proba = IAI.predict_proba(lnr, X_test)

    # P(no-show) = proba[:,2], P(show) = 1 - P(no-show)
    p_no   = proba[:, 2]
    p_show = 1 .- p_no

    n = length(p_show)

    # Decision and metrics
    z              = Vector{Int}(undef, n)     # 0 = no overbook, 1 = overbook
    cost_baseline  = Vector{Float64}(undef, n)
    cost_policy    = Vector{Float64}(undef, n)
    seen_baseline  = Vector{Float64}(undef, n)
    seen_policy    = Vector{Float64}(undef, n)

    for i in 1:n
        p = p_show[i]

        # expected cost from the prescriptive ML cost c(z; y_i)
        #   cost0 = 150 * P(no-show)   = 150 * (1 - p)
        #   cost1 = 75  * P(show)      = 75  * p
        cost0 = c_idle  * (1 - p)
        cost1 = c_delay * p

        # pick the cheaper decision
        if cost1 < cost0
            z[i] = 1
            cost_policy[i] = cost1
        else
            z[i] = 0
            cost_policy[i] = cost0
        end

        # baseline always uses z = 0
        cost_baseline[i] = cost0

        # expected patients seen:
        # baseline: book 1 → E[seen] = p
        seen_baseline[i] = p

        # policy: book b = 1 + z[i] → E[seen] = 1 - (1 - p)^b
        b = 1 + z[i]
        seen_policy[i] = 1 - (1 - p)^b
    end

    avg_cost_baseline = mean(cost_baseline)
    avg_cost_policy   = mean(cost_policy)
    cost_savings      = avg_cost_baseline - avg_cost_policy

    avg_seen_baseline = mean(seen_baseline)
    avg_seen_policy   = mean(seen_policy)
    extra_seen        = avg_seen_policy - avg_seen_baseline

    return (
        z = z,
        avg_cost_baseline = avg_cost_baseline,
        avg_cost_policy   = avg_cost_policy,
        cost_savings      = cost_savings,
        avg_seen_baseline = avg_seen_baseline,
        avg_seen_policy   = avg_seen_policy,
        extra_seen        = extra_seen,
    )
end
prescriptive_policy_from_model
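Because each appointment's cost is independent, the decision rule inside `prescriptive_policy_from_model` reduces to a fixed threshold on the predicted show probability: overbook exactly when c_delay·p < c_idle·(1 − p), i.e. when p_show < c_idle / (c_idle + c_delay) = 150/225 = 2/3. A quick standalone sketch of this threshold, using the same costs as the docstring (`decide` is a hypothetical helper, not part of the pipeline):

```julia
# Standalone check of the overbooking rule used above.
# Overbook (z = 1) iff c_delay * p_show < c_idle * (1 - p_show),
# which is equivalent to p_show < c_idle / (c_idle + c_delay).
c_idle, c_delay = 150.0, 75.0

decide(p_show) = c_delay * p_show < c_idle * (1 - p_show) ? 1 : 0

threshold = c_idle / (c_idle + c_delay)   # = 2/3

@assert threshold ≈ 2 / 3
@assert decide(0.5) == 1   # likely no-show → overbook
@assert decide(0.9) == 0   # likely show → keep the single booking
```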
In [54]:
res_oct  = prescriptive_policy_from_model(lnr_oct,  X_test)
res_cart = prescriptive_policy_from_model(lnr_cart, X_test)
res_rf   = prescriptive_policy_from_model(lnr_rf,   X_test)

println("=== Prescriptive Results (Expected Cost, BRL per slot) ===")
println("OCT:   baseline = ", round(res_oct.avg_cost_baseline,  digits=2),
        ", policy = ",        round(res_oct.avg_cost_policy,    digits=2),
        ", savings = ",       round(res_oct.cost_savings,       digits=2))

println("CART:  baseline = ", round(res_cart.avg_cost_baseline, digits=2),
        ", policy = ",        round(res_cart.avg_cost_policy,   digits=2),
        ", savings = ",       round(res_cart.cost_savings,      digits=2))

println("RF:    baseline = ", round(res_rf.avg_cost_baseline,   digits=2),
        ", policy = ",        round(res_rf.avg_cost_policy,     digits=2),
        ", savings = ",       round(res_rf.cost_savings,        digits=2))

println("\n=== Expected Patients Seen per Slot ===")
println("OCT:   baseline = ", round(res_oct.avg_seen_baseline,  digits=4),
        ", policy = ",        round(res_oct.avg_seen_policy,    digits=4),
        ", extra = ",         round(res_oct.extra_seen,         digits=4))

println("CART:  baseline = ", round(res_cart.avg_seen_baseline, digits=4),
        ", policy = ",        round(res_cart.avg_seen_policy,   digits=4),
        ", extra = ",         round(res_cart.extra_seen,        digits=4))

println("RF:    baseline = ", round(res_rf.avg_seen_baseline,   digits=4),
        ", policy = ",        round(res_rf.avg_seen_policy,     digits=4),
        ", extra = ",         round(res_rf.extra_seen,          digits=4))
=== Prescriptive Results (Expected Cost, BRL per slot) ===
OCT:   baseline = 30.24, policy = 28.03, savings = 2.21
CART:  baseline = 30.34, policy = 28.56, savings = 1.78
RF:    baseline = 30.32, policy = 28.44, savings = 1.88

=== Expected Patients Seen per Slot ===
OCT:   baseline = 0.7984, policy = 0.8429, extra = 0.0445
CART:  baseline = 0.7977, policy = 0.8388, extra = 0.041
RF:    baseline = 0.7978, policy = 0.8367, extra = 0.0388
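The "expected patients seen" figures above come from the slot-filling identity used for `seen_policy`: with `b` independent bookings that each show with probability `p`, the slot stays empty only if all `b` patients no-show. A quick sanity check of that identity:

```julia
# E[seen] for a single slot with b independent bookings, show probability p:
# the slot goes unused only if all b no-show, so E[seen] = 1 - (1 - p)^b.
expected_seen(p, b) = 1 - (1 - p)^b

@assert expected_seen(0.8, 1) ≈ 0.8    # baseline: book one patient
@assert expected_seen(0.8, 2) ≈ 0.96   # overbooked slot: 1 - 0.2^2
@assert expected_seen(0.0, 2) == 0.0   # nobody ever shows
```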

Prescription with Gurobi¶

In [ ]:
const C_IDLE  = 150.0
const C_DELAY = 75.0

# y_train: 1 = no-show, 0 = show
function leaf_costs_from_training_tree(lnr::IAI.TreeLearner,
                                       X_train,
                                       y_train;
                                       c_idle::Float64 = C_IDLE,
                                       c_delay::Float64 = C_DELAY)

    node_inds = IAI.apply_nodes(lnr, X_train)
    num_nodes = length(node_inds)

    leaf_cost0  = zeros(Float64, num_nodes)
    leaf_cost1  = zeros(Float64, num_nodes)
    leaf_p_show = zeros(Float64, num_nodes)

    for t in 1:num_nodes
        inds = node_inds[t]
        if isempty(inds) || !IAI.is_leaf(lnr, t)
            continue
        end

        n_leaf = length(inds)
        num_no   = sum(y_train[inds] .== 1)
        num_show = n_leaf - num_no

        frac_no   = num_no   / n_leaf
        frac_show = num_show / n_leaf

        leaf_cost0[t]  = c_idle  * frac_no
        leaf_cost1[t]  = c_delay * frac_show
        leaf_p_show[t] = frac_show
    end

    return leaf_cost0, leaf_cost1, leaf_p_show
end
leaf_costs_from_training_tree (generic function with 1 method)
In [ ]:
function prescribe_tree_with_gurobi(lnr::IAI.TreeLearner,
                                    X_train,
                                    y_train,
                                    X_test;
                                    c_idle::Float64 = C_IDLE,
                                    c_delay::Float64 = C_DELAY)

    leaf_cost0, leaf_cost1, leaf_p_show =
        leaf_costs_from_training_tree(lnr, X_train, y_train;
                                      c_idle=c_idle, c_delay=c_delay)

    leaf_test = IAI.apply(lnr, X_test)
    n_test = length(leaf_test)

    model = Model(Gurobi.Optimizer)
    set_silent(model)

    @variable(model, z[1:n_test], Bin)   

    @objective(model, Min,
        sum( leaf_cost0[leaf_test[j]] * (1 - z[j]) +
             leaf_cost1[leaf_test[j]] * z[j]
             for j in 1:n_test)
    )

    optimize!(model)

    z_opt = Int.(round.(value.(z)))

    cost_base = [leaf_cost0[leaf_test[j]] for j in 1:n_test]
    cost_pol  = [leaf_cost0[leaf_test[j]] * (1 - z_opt[j]) +
                 leaf_cost1[leaf_test[j]] * z_opt[j]
                 for j in 1:n_test]

    avg_cost_base = mean(cost_base)
    avg_cost_pol  = mean(cost_pol)
    savings       = avg_cost_base - avg_cost_pol

    seen_base = Float64[]
    seen_pol  = Float64[]
    for j in 1:n_test
        p = leaf_p_show[leaf_test[j]]    
        push!(seen_base, p)              
        b = 1 + z_opt[j]                 
        push!(seen_pol, 1 - (1 - p)^b)   
    end

    avg_seen_base = mean(seen_base)
    avg_seen_pol  = mean(seen_pol)
    extra_seen    = avg_seen_pol - avg_seen_base

    return (
        z_opt         = z_opt,
        avg_cost_base = avg_cost_base,
        avg_cost_pol  = avg_cost_pol,
        savings       = savings,
        avg_seen_base = avg_seen_base,
        avg_seen_pol  = avg_seen_pol,
        extra_seen    = extra_seen,
    )
end
prescribe_tree_with_gurobi (generic function with 1 method)
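Note that the MIP above has no constraints linking different appointments, so the problem is separable: the optimal z[j] is simply the per-appointment argmin between the two leaf costs, and Gurobi should recover exactly that. A minimal sketch of the equivalence with hypothetical costs (no solver needed):

```julia
# The objective sum_j cost0[j]*(1 - z[j]) + cost1[j]*z[j] has no coupling
# constraints, so each z[j] is chosen independently as the cheaper option.
cost0 = [30.0, 10.0, 50.0]   # hypothetical per-appointment costs for z = 0
cost1 = [20.0, 40.0, 50.0]   # hypothetical per-appointment costs for z = 1

z_opt = [cost1[j] < cost0[j] ? 1 : 0 for j in eachindex(cost0)]

@assert z_opt == [1, 0, 0]   # strict inequality breaks ties toward z = 0

total = sum(cost0[j] * (1 - z_opt[j]) + cost1[j] * z_opt[j] for j in eachindex(cost0))
@assert total == 20.0 + 10.0 + 50.0   # matches the MIP optimum for these costs
```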
In [ ]:
function prescribe_rf_with_gurobi(rf_lnr,
                                  X_test;
                                  c_idle::Float64 = C_IDLE,
                                  c_delay::Float64 = C_DELAY)

    proba = IAI.predict_proba(rf_lnr, X_test)
    p_no   = proba[:, 2]           
    p_show = 1 .- p_no             

    n_test = length(p_show)

    cost0 = c_idle  .* (1 .- p_show)
    cost1 = c_delay .* p_show

    model = Model(Gurobi.Optimizer)
    set_silent(model)

    @variable(model, z[1:n_test], Bin)

    @objective(model, Min,
        sum(cost0[j] * (1 - z[j]) + cost1[j] * z[j] for j in 1:n_test)
    )

    optimize!(model)

    z_opt = Int.(round.(value.(z)))

    cost_base = cost0                 
    cost_pol  = [cost0[j] * (1 - z_opt[j]) + cost1[j] * z_opt[j]
                 for j in 1:n_test]

    avg_cost_base = mean(cost_base)
    avg_cost_pol  = mean(cost_pol)
    savings       = avg_cost_base - avg_cost_pol

    seen_base = p_show                            
    seen_pol  = [1 - (1 - p_show[j])^(1 + z_opt[j]) for j in 1:n_test]

    avg_seen_base = mean(seen_base)
    avg_seen_pol  = mean(seen_pol)
    extra_seen    = avg_seen_pol - avg_seen_base

    return (
        z_opt         = z_opt,
        avg_cost_base = avg_cost_base,
        avg_cost_pol  = avg_cost_pol,
        savings       = savings,
        avg_seen_base = avg_seen_base,
        avg_seen_pol  = avg_seen_pol,
        extra_seen    = extra_seen,
    )
end
prescribe_rf_with_gurobi (generic function with 1 method)
In [ ]:
# Best learners from the grids above
oct_lnr  = lnr_oct
cart_lnr = lnr_cart
rf_lnr   = lnr_rf

res_oct  = prescribe_tree_with_gurobi(oct_lnr,  X_train, y_train, X_test)
res_cart = prescribe_tree_with_gurobi(cart_lnr, X_train, y_train, X_test)
res_rf   = prescribe_rf_with_gurobi(rf_lnr, X_test)

println("=== Expected Cost (BRL per slot) ===")
println("OCT:   baseline = ", round(res_oct.avg_cost_base,  digits=2),
        ", policy = ",        round(res_oct.avg_cost_pol,   digits=2),
        ", savings = ",       round(res_oct.savings,        digits=2))

println("CART:  baseline = ", round(res_cart.avg_cost_base, digits=2),
        ", policy = ",        round(res_cart.avg_cost_pol,  digits=2),
        ", savings = ",       round(res_cart.savings,       digits=2))

println("RF:    baseline = ", round(res_rf.avg_cost_base,   digits=2),
        ", policy = ",        round(res_rf.avg_cost_pol,    digits=2),
        ", savings = ",       round(res_rf.savings,         digits=2))

println("\n=== Expected Patients Seen per Slot ===")
println("OCT:   baseline = ", round(res_oct.avg_seen_base,  digits=4),
        ", policy = ",        round(res_oct.avg_seen_pol,   digits=4),
        ", extra = ",         round(res_oct.extra_seen,     digits=4))

println("CART:  baseline = ", round(res_cart.avg_seen_base, digits=4),
        ", policy = ",        round(res_cart.avg_seen_pol,  digits=4),
        ", extra = ",         round(res_cart.extra_seen,    digits=4))

println("RF:    baseline = ", round(res_rf.avg_seen_base,   digits=4),
        ", policy = ",        round(res_rf.avg_seen_pol,    digits=4),
        ", extra = ",         round(res_rf.extra_seen,      digits=4))
Set parameter Username
Set parameter LicenseID to value 2710905
Academic license - for non-commercial use only - expires 2026-09-20
=== Expected Cost (BRL per slot) ===
OCT:   baseline = 30.24, policy = 28.06, savings = 2.18
CART:  baseline = 30.35, policy = 28.63, savings = 1.72
RF:    baseline = 30.32, policy = 28.44, savings = 1.88

=== Expected Patients Seen per Slot ===
OCT:   baseline = 0.7984, policy = 0.8428, extra = 0.0444
CART:  baseline = 0.7977, policy = 0.8386, extra = 0.0409
RF:    baseline = 0.7978, policy = 0.8367, extra = 0.0388
